AITopics | sparse data

sparse data

Nonnegative Matrix Factorization in the Component-Wise L1 Norm for Sparse Data

Seraghiti, Giovanni, Dubrulle, Kévin, Vandaele, Arnaud, Gillis, Nicolas

arXiv.org Machine Learning, Apr-1-2026

Nonnegative matrix factorization (NMF) approximates a nonnegative matrix, $X$, by the product of two nonnegative factors, $WH$, where $W$ has $r$ columns and $H$ has $r$ rows. In this paper, we consider NMF using the component-wise L1 norm as the error measure (L1-NMF), which is suited for data corrupted by heavy-tailed noise, such as Laplace noise or salt-and-pepper noise, or in the presence of outliers. Our first contribution is an NP-hardness proof for L1-NMF, even when $r=1$, in contrast to the standard NMF that uses least squares. Our second contribution is to show that L1-NMF strongly enforces sparsity in the factors for sparse input matrices, thereby favoring interpretability. However, if the data is affected by false zeros, overly sparse solutions might degrade the model. Our third contribution is a new, more general L1-NMF model for sparse data, dubbed weighted L1-NMF (wL1-NMF), where the sparsity of the factorization is controlled by adding a penalization parameter to the entries of $WH$ associated with zeros in the data. The fourth contribution is a new coordinate descent (CD) approach for wL1-NMF, denoted as sparse CD (sCD), where each subproblem is solved by a weighted median algorithm. To the best of our knowledge, sCD is the first algorithm for L1-NMF whose complexity scales with the number of nonzero entries in the data, making it efficient in handling large-scale, sparse data. We perform extensive numerical experiments on synthetic and real-world data to show the effectiveness of our newly proposed model (wL1-NMF) and algorithm (sCD).
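To illustrate why a weighted median appears in coordinate descent for L1 fitting, the following is a minimal sketch of a scalar subproblem: minimizing sum_j |x[j] - t*h[j]| over t >= 0. Dividing each term by |h[j]| turns this into a weighted median of the ratios x[j]/h[j] with weights |h[j]|. The function names `weighted_median` and `l1_scalar_update` are hypothetical, and this dense version is not the authors' sCD: it omits the wL1-NMF penalization on zero entries and does not exploit sparsity.

```python
import numpy as np

def weighted_median(values, weights):
    """Return a minimizer of sum_i weights[i] * |values[i] - t| over t.

    Assumes all weights are strictly positive.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    # First sorted position where the cumulative weight reaches half the total.
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return float(v[idx])

def l1_scalar_update(x, h):
    """Minimize sum_j |x[j] - t * h[j]| over t >= 0 (illustrative only)."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    mask = h != 0  # terms with h[j] == 0 do not depend on t
    if not mask.any():
        return 0.0
    t = weighted_median(x[mask] / h[mask], np.abs(h[mask]))
    # The objective is convex in t, so clipping at 0 gives the constrained minimizer.
    return max(t, 0.0)

# One outlier (100) does not move the estimate away from the ratio 2.
t_star = l1_scalar_update([2, 4, 6, 100], [1, 2, 3, 1])
```

Because the median ignores the magnitude of extreme residuals, the single outlier in the usage line leaves the fitted scale unchanged, which is the robustness property the abstract attributes to the L1 error measure.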

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2603.29715

Country:

Europe > United Kingdom (0.04)
Europe > Belgium (0.04)
North America > United States > Utah (0.04)
(5 more...)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Sports (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Sparse clustering via the Deterministic Information Bottleneck algorithm

Costa, Efthymios, Papatsouma, Ioanna, Markos, Angelos

arXiv.org Machine Learning, Jan-29-2026

Cluster analysis relates to the task of assigning objects to groups which ideally present some desirable characteristics. When a cluster structure is confined to a subset of the feature space, traditional clustering techniques face unprecedented challenges. We present an information-theoretic framework that overcomes the problems associated with sparse data, allowing for joint feature weighting and clustering. Our proposal constitutes a competitive alternative to existing clustering algorithms for sparse data, as demonstrated through simulations on synthetic data. The effectiveness of our method is established by an application to a real-world genomics data set.

artificial intelligence, machine learning, sparse dib, (13 more...)

arXiv.org Machine Learning

2601.20628

Country: Europe > United Kingdom (0.14)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Secure Sparse Matrix Multiplications and their Applications to Privacy-Preserving Machine Learning

Damie, Marc, Hahn, Florian, Peter, Andreas, Ramon, Jan

arXiv.org Artificial Intelligence, Oct-17-2025

To preserve privacy, multi-party computation (MPC) enables executing Machine Learning (ML) algorithms on secret-shared or encrypted data. However, existing MPC frameworks are not optimized for sparse data. This makes them unsuitable for ML applications involving sparse data, e.g., recommender systems or genomics. Even in plaintext, such applications involve high-dimensional sparse data, which cannot be processed without sparsity-related optimizations due to prohibitively large memory requirements. Since matrix multiplication is central in ML algorithms, we propose MPC algorithms to multiply secret sparse matrices. On the one hand, our algorithms avoid the memory issues of the "dense" data representation of classic secure matrix multiplication algorithms. On the other hand, our algorithms can significantly reduce communication costs (some experiments show a factor of 1000) for realistic problem sizes. We validate our algorithms in two ML applications in which existing protocols are impractical. An important question when developing MPC algorithms is what assumptions can be made. In our case, if the number of non-zeros in a row is a sensitive piece of information, then a short runtime may reveal that the number of non-zeros is small. Existing approaches make relatively simple assumptions, e.g., that there is a universal upper bound on the number of non-zeros in a row. This often does not align with statistical reality: in many sparse datasets, the amount of data per instance follows a power law. We propose an approach which allows adopting a safe upper bound on the distribution of non-zeros in rows/columns of sparse matrices.
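As a plaintext illustration of the sparsity-related optimization the abstract refers to, here is a minimal sketch of sparse matrix multiplication over a dictionary-of-rows layout, where work and memory scale with the number of non-zeros rather than with the full matrix dimensions. This is not an MPC protocol and involves no secret sharing; the `{row: {col: value}}` layout and the name `sparse_matmul` are assumptions made for the example.

```python
from collections import defaultdict

def sparse_matmul(a, b):
    """Multiply two sparse matrices stored as {row: {col: value}}.

    Only stored (non-zero) entries are visited, so zero entries cost nothing.
    """
    out = defaultdict(dict)
    for i, row in a.items():
        for k, a_ik in row.items():
            # Row k of b holds every b[k][j] that can pair with a[i][k].
            for j, b_kj in b.get(k, {}).items():
                out[i][j] = out[i].get(j, 0) + a_ik * b_kj
    return dict(out)

# A 3 x 2 matrix times a 2 x 6 matrix, each with only a few stored entries.
a = {0: {1: 2}, 2: {0: 3}}
b = {0: {0: 1, 5: 1}, 1: {5: 4}}
product = sparse_matmul(a, b)
```

Note that the running time of this plaintext routine depends directly on how many non-zeros each row holds, which is exactly the kind of leakage the abstract says a secure variant has to account for.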

data mining, machine learning, multiplication, (19 more...)

arXiv.org Artificial Intelligence

2510.14894

Country: Europe (1.00)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Quantum Noise Tomography with Physics-Informed Neural Networks

Sulc, Antonin

arXiv.org Artificial Intelligence, Sep-16-2025

Characterizing the environmental interactions of quantum systems is a critical bottleneck in the development of robust quantum technologies. Traditional tomographic methods are often data-intensive and struggle with scalability. In this work, we introduce a novel framework for performing Lindblad tomography using Physics-Informed Neural Networks (PINNs). By embedding the Lindblad master equation directly into the neural network's loss function, our approach simultaneously learns the quantum state's evolution and infers the underlying dissipation parameters from sparse, time-series measurement data. Our results show that PINNs can reconstruct both the system dynamics and the functional form of unknown noise parameters, presenting a sample-efficient and scalable solution for quantum device characterization. Ultimately, our method produces a fully-differentiable digital twin of a noisy quantum system by learning its governing master equation.
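For reference, the Lindblad master equation mentioned in the abstract has the following standard form (with $\hbar = 1$), where $H$ is the system Hamiltonian, $L_k$ are the jump operators, and $\gamma_k \ge 0$ are the dissipation rates. The second expression is only a generic sketch of how a physics-informed loss is commonly assembled, not the loss used in the paper: the network output $\rho_\theta$, the weight $\lambda$, the collocation times $t_m$, and the assumption that measurements are compared directly against the state are all illustrative.

```latex
% Lindblad master equation (standard form, \hbar = 1)
\frac{d\rho}{dt}
  = -i\,[H, \rho]
  + \sum_k \gamma_k \left( L_k \rho L_k^\dagger
  - \tfrac{1}{2} \left\{ L_k^\dagger L_k, \rho \right\} \right)
  \;\equiv\; \mathcal{D}_\gamma[\rho]

% Generic physics-informed loss (illustrative assumption):
% a data-fit term on N sparse measurements plus an equation residual
% evaluated at M collocation times
\mathcal{L}(\theta, \gamma)
  = \frac{1}{N} \sum_{n=1}^{N}
    \left\| \rho_\theta(t_n) - \rho^{\mathrm{meas}}_n \right\|^2
  + \lambda \, \frac{1}{M} \sum_{m=1}^{M}
    \left\| \frac{d\rho_\theta}{dt}(t_m)
    - \mathcal{D}_\gamma[\rho_\theta(t_m)] \right\|^2
```

Minimizing jointly over $\theta$ and $\gamma$ is what lets such a scheme fit the trajectory and infer the dissipation parameters at the same time; in practice the data term may be written on measured observables rather than on the full density matrix.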

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2509.11911

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Domain Adaptation and Multi-view Attention for Learnable Landmark Tracking with Sparse Data

Chase, Timothy Jr, Dantu, Karthik

arXiv.org Artificial Intelligence, Jul-15-2025

The detection and tracking of celestial surface terrain features are crucial for autonomous spaceflight applications, including Terrain Relative Navigation (TRN), Entry, Descent, and Landing (EDL), hazard analysis, and scientific data collection. Traditional photoclinometry-based pipelines often rely on extensive a priori imaging and offline processing, constrained by the computational limitations of radiation-hardened systems. While historically effective, these approaches typically increase mission costs and duration, operate at low processing rates, and have limited generalization. Recently, learning-based computer vision has gained popularity to enhance spacecraft autonomy and overcome these limitations. While promising, emerging techniques frequently impose computational demands exceeding the capabilities of typical spacecraft hardware for real-time operation and are further challenged by the scarcity of labeled training data for diverse extraterrestrial environments. In this work, we present novel formulations for in-situ landmark tracking via detection and description. We utilize lightweight, computationally efficient neural network architectures designed for real-time execution on current-generation spacecraft flight processors. For landmark detection, we propose improved domain adaptation methods that enable the identification of celestial terrain features with distinct, cheaply acquired training data. Concurrently, for landmark description, we introduce a novel attention alignment formulation that learns robust feature representations that maintain correspondence despite significant landmark viewpoint variations. Together, these contributions form a unified system for landmark tracking that demonstrates superior performance compared to existing state-of-the-art techniques.

artificial intelligence, landmark, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2507.0942

Genre: Research Report (0.70)

Industry: Aerospace & Defense (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

Reconstructing dynamics from sparse observations with no training on target system

Zhai, Zheng-Meng, Huang, Jun-Yin, Stern, Benjamin D., Lai, Ying-Cheng

arXiv.org Artificial Intelligence, Oct-28-2024

In applications, an anticipated situation is where the system of interest has never been encountered before and sparse observations can be made only once. Can the dynamics be faithfully reconstructed from the limited observations without any training data? This problem defies any known traditional methods of nonlinear time-series analysis as well as existing machine-learning methods that typically require extensive data from the target system for training. We address this challenge by developing a hybrid transformer and reservoir-computing machine-learning scheme. The key idea is that, for a complex and nonlinear target system, the training of the transformer can be conducted not using any data from the target system, but with essentially unlimited synthetic data from known chaotic systems. The trained transformer is then tested with the sparse data from the target system. The output of the transformer is further fed into a reservoir computer for predicting the long-term dynamics or the attractor of the target system. The power of the proposed hybrid machine-learning framework is demonstrated using a large number of prototypical nonlinear dynamical systems, with high reconstruction accuracy even when the available data is only 20% of that required to faithfully represent the dynamical behavior of the underlying system. The framework provides a paradigm of reconstructing complex and nonlinear dynamics in the extreme situation where training data does not exist and the observations are random and sparse.

artificial intelligence, machine learning, transformer, (17 more...)

arXiv.org Artificial Intelligence

2410.21222

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.14)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.63)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Novel Framework for Analyzing Structural Transformation in Data-Constrained Economies Using Bayesian Modeling and Machine Learning

Katende, Ronald

arXiv.org Machine Learning, Sep-25-2024

Structural transformation, the shift from agrarian economies to more diversified industrial and service-based systems, is a key driver of economic development. However, in low- and middle-income countries (LMICs), data scarcity and unreliability hinder accurate assessments of this process. This paper presents a novel statistical framework designed to address these challenges by integrating Bayesian hierarchical modeling, machine learning-based data imputation, and factor analysis. The framework is specifically tailored for conditions of data sparsity and is capable of providing robust insights into sectoral productivity and employment shifts across diverse economies. By utilizing Bayesian models, uncertainties in data are effectively managed, while machine learning techniques impute missing data points, ensuring the integrity of the analysis. Factor analysis reduces the dimensionality of complex datasets, distilling them into core economic structures. The proposed framework has been validated through extensive simulations, demonstrating its ability to predict structural changes even when up to 60% of data is missing. This approach offers policymakers and researchers a valuable tool for making informed decisions in environments where data quality is limited, contributing to the broader understanding of economic development in LMICs.

sparse data, structural transformation, transformation, (12 more...)

arXiv.org Machine Learning

2409.16738

Country:

Africa > Sub-Saharan Africa (0.05)
Africa > Nigeria (0.04)
Africa > Kenya (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Banking & Finance > Economy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Enhancing Startup Success Predictions in Venture Capital: A GraphRAG Augmented Multivariate Time Series Method

Gao, Zitian, Xiao, Yihao

arXiv.org Artificial Intelligence, Aug-21-2024

In the Venture Capital (VC) industry, predicting the success of startups is challenging due to limited financial data and the need for subjective revenue forecasts. Previous methods based on time series analysis or deep learning often fall short as they fail to incorporate crucial inter-company relationships such as competition and collaboration. To address these issues, we propose a novel approach using a GraphRAG-augmented time series model. With GraphRAG, time series predictive methods are enhanced by integrating these vital relationships into the analysis framework, allowing for a more dynamic understanding of the startup ecosystem in venture capital. Our experimental results demonstrate that our model significantly outperforms previous models in startup success predictions. To the best of our knowledge, our work is the first application of GraphRAG.

dataset, graphrag, startup, (14 more...)

arXiv.org Artificial Intelligence

2408.0942

Country:

Europe > Netherlands > South Holland > Leiden (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.88)

Industry:

Banking & Finance > Trading (1.00)
Banking & Finance > Capital Markets (0.92)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

APS-USCT: Ultrasound Computed Tomography on Sparse Data via AI-Physic Synergy

Sheng, Yi, Wang, Hanchen, Liu, Yipei, Yang, Junhuan, Jiang, Weiwen, Lin, Youzuo, Yang, Lei

arXiv.org Artificial Intelligence, Jul-18-2024

Ultrasound computed tomography (USCT) is a promising technique that achieves superior medical imaging reconstruction resolution by fully leveraging waveform information, outperforming conventional ultrasound methods. Despite its advantages, high-quality USCT reconstruction relies on extensive data acquisition by a large number of transducers, leading to increased costs, computational demands, extended patient scanning times, and manufacturing complexities. To mitigate these issues, we propose a new USCT method called APS-USCT, which facilitates imaging with sparse data, substantially reducing dependence on high-cost dense data acquisition. Our APS-USCT method consists of two primary components: APS-wave and APS-FWI. The APS-wave component, an encoder-decoder system, preprocesses the waveform data, converting sparse data into dense waveforms to augment sample density prior to reconstruction. The APS-FWI component, utilizing the InversionNet, directly reconstructs the speed of sound (SOS) from the ultrasound waveform data. We further improve the model's performance by incorporating Squeeze-and-Excitation (SE) Blocks and source encoding techniques. Testing our method on a breast cancer dataset yielded promising results. It demonstrated outstanding performance with an average Structural Similarity Index (SSIM) of 0.8431. Notably, over 82% of samples achieved an SSIM above 0.8, with nearly 61% exceeding 0.85, highlighting the significant potential of our approach in improving USCT image reconstruction by efficiently utilizing sparse data.
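For readers unfamiliar with the metric quoted above, the Structural Similarity Index between two image patches $x$ and $y$ is usually defined as follows, where $\mu$ denotes the patch mean, $\sigma^2$ the variance, $\sigma_{xy}$ the covariance, and $C_1$, $C_2$ are small stabilizing constants. Which window size and constants the authors used is not stated in the abstract, so the values noted in the comments are only the common defaults.

```latex
\mathrm{SSIM}(x, y)
  = \frac{(2\mu_x \mu_y + C_1)\,(2\sigma_{xy} + C_2)}
         {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)}

% Common defaults (an assumption here, not taken from the abstract):
% C_1 = (K_1 L)^2, \quad C_2 = (K_2 L)^2, \quad K_1 = 0.01, \; K_2 = 0.03,
% where L is the dynamic range of the pixel values.
```

The index equals 1 only for identical patches; image-level scores such as the reported average of 0.8431 are typically obtained by averaging the patch values over a sliding window.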

aps-usct, dense waveform, waveform, (14 more...)

arXiv.org Artificial Intelligence

2407.14564

Country:

North America > United States > North Carolina (0.04)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.67)
Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

The Sparse Tsetlin Machine: Sparse Representation with Active Literals

Østby, Sebastian, Brambo, Tobias M., Glimsdal, Sondre

arXiv.org Artificial Intelligence, May-11-2024

This paper introduces the Sparse Tsetlin Machine (STM), a novel Tsetlin Machine (TM) that processes sparse data efficiently. Traditionally, the TM does not consider data characteristics such as sparsity, commonly seen in NLP applications and other bag-of-word-based representations. Consequently, a TM must initialize, store, and process a significant number of zero values, resulting in excessive memory usage and computational time. Previous attempts at creating a sparse TM have predominantly been unsuccessful, primarily due to their inability to identify which literals are sufficient for TM training. By introducing Active Literals (AL), the STM can focus exclusively on literals that actively contribute to the current data representation, significantly decreasing memory footprint and computational time while demonstrating competitive classification performance.
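To make the memory argument concrete, the sketch below evaluates a single conjunctive clause on a binary input stored sparsely as the set of feature indices equal to 1, so the zero values of a bag-of-words vector are never stored or visited. This shows only the generic idea of clause evaluation on a sparse input; it is not the paper's Active Literals mechanism, and the name `clause_output` and its parameters are hypothetical.

```python
def clause_output(active, include_pos, include_neg):
    """Evaluate a conjunction of literals on a sparse binary input.

    active:      set of feature indices whose value is 1
    include_pos: indices k whose literal x_k is part of the clause
    include_neg: indices k whose negated literal NOT x_k is part of the clause
    """
    # Every positive literal must be on, and no negated literal may be on.
    return include_pos <= active and active.isdisjoint(include_neg)

# A document containing words 1 and 5 out of an arbitrarily large vocabulary.
doc = {1, 5}
fires = clause_output(doc, include_pos={1}, include_neg={3})
```

The cost of the check depends on the sizes of the three sets rather than on the vocabulary size, which is the property a sparse representation is meant to deliver.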

dataset, representation, stm, (14 more...)

arXiv.org Artificial Intelligence

2405.02375

Country: Europe > Norway (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology (0.69)
Media (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.47)